Finite-state Transducer Base with Explicit Modeling of Ph
Abstract
This article describes the design and experimental evaluation of the first Hungarian large vocabulary continuous speech recognition (LVCSR) system. The architecture of the recognition system is based on the recently proposed weighted finite-state transducer (WFST) paradigm. The task domain is the recognition of fluently read sentences selected from a major daily newspaper. Recognition performance is evaluated using both monophone and triphone gender-independent acoustic models. The vocabulary units used in the system are morpheme-based, in order to provide sufficient coverage of the large number of word forms that result from affixation and compounding in Hungarian. The language model is a statistical morpheme bigram model. In addition to the basic list-style pronunciation dictionary model, we evaluate a novel phonology modeling component that describes the phonological changes prevalent in fluent Hungarian. Thanks to the flexible transducer-based architecture of the system, the phonological component integrates seamlessly with the basic modules, with no need to modify the decoder itself. The proposed phonological model yields a relative error rate reduction of 8.32% over the baseline triphone system. The morpheme error rate of the best configuration is 17.74% on a 1200-morpheme task with a test-set perplexity of 70.
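The relative reduction quoted in the abstract can be unpacked with a short calculation. The sketch below is illustrative only: it assumes that the 17.74% morpheme error rate belongs to the configuration with the phonological model and that the 8.32% relative reduction is measured against the triphone baseline on the same test set. The abstract does not state the baseline error rate, so the value computed here is an inference, not a reported result. The function names are hypothetical.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error rate reduction in percent: 100 * (baseline - improved) / baseline."""
    return 100.0 * (baseline - improved) / baseline


def implied_baseline(improved: float, reduction_pct: float) -> float:
    """Invert the formula above to recover the baseline error rate."""
    return improved / (1.0 - reduction_pct / 100.0)


# Figures quoted in the abstract.
best_mer = 17.74      # morpheme error rate of the best configuration, in percent
rel_reduction = 8.32  # relative reduction over the triphone baseline, in percent

# Assumption: both figures refer to the same pair of systems.
baseline_mer = implied_baseline(best_mer, rel_reduction)
print(f"implied baseline morpheme error rate: {baseline_mer:.2f}%")
print(f"absolute reduction: {baseline_mer - best_mer:.2f} percentage points")
```

Under these assumptions the baseline triphone system would sit at roughly 19.35%, so the 8.32% relative gain corresponds to about 1.6 percentage points in absolute terms.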
Similar resources
Explicit Modeling of Phonological Changes in Finite-state Transducer Based Hungarian LVCSR
This article describes the operation and the experimental evaluation of the pronunciation modeling component of the first Hungarian large vocabulary continuous speech recognition system. The proposed method is based on the implementation of context dependent rewrite rules by weighted finite state transducers (WFSTs). The proposed phonological model decreases the error rate by 8.32% relatively c...
Finite-state transducer based Hungarian LVCSR with explicit modeling of phonological changes
This article describes the design and the experimental evaluation of the first Hungarian large vocabulary continuous speech recognition (LVCSR) system. The architecture of the recognition system is based on the recently proposed weighted finite state transducer (WFST) paradigm. The task domain is the recognition of fluently read sentences selected from a major daily newspaper. Recognition perfo...
Pressure-Velocity Coupled Finite Volume Solution of Steady Incompressible Invscid Flow Using Artificial Compressibility Technique
Application of the computer simulation for solving the incompressible flow problems motivates developing efficient and accurate numerical models. The set of Inviscid Incompressible Euler equations can be applied for wide range of engineering applications. For the steady state problems, the equation of continuity can be simultaneously solved with the equations of motion in a coupled manner using...
TStore: A Trace-Base Management System - Using Finite-state Transducer Approach for Trace Transformation
A Stochastic Finite-State Morphological Parser for Turkish
This paper presents the first stochastic finite-state morphological parser for Turkish. The non-probabilistic parser is a standard finite-state transducer implementation of two-level morphology formalism. A disambiguated text corpus of 200 million words is used to stochastize the morphotactics transducer, then it is composed with the morphophonemics transducer to get a stochastic morphological ...